Abstract. In model-based recognition a number of ad hoc techniques are used to
decide whether or not a match of data to a model is correct. Generally an empirically 
determined threshold is placed on the fraction of model features that must be matched. 
In this paper we present a more rigorous approach in which the conditions under which to 
accept a match are derived based on fundamental grounds. We obtain an expression that 
relates the probability of a match occurring at random to the fraction of model features 
that are accounted for by the match. This expression is a function of the number of model 
features, the number of image features, and a bound on the degree of sensor noise. 

One implication of our analysis is that a proper threshold for matching must vary 
with the number of model and data features. Thus it is important to be able to set the 
threshold as a function of a particular matching problem, rather than setting a single 
threshold based on experimentation. We analyze some existing recognition systems and 
find that our method yields thresholds similar to the ones that were determined empirically 
for these systems, providing evidence of the validity of the technique. 



Acknowledgments: This report describes research done at the Artificial Intelligence Labo- 
ratory of the Massachusetts Institute of Technology. Support for the laboratory's artificial 
intelligence research is provided in part by an Office of Naval Research University Research 
Initiative grant under contract N00014-86-K-0685, and in part by the Advanced Research
Projects Agency of the Department of Defense under Army contract number DACA76- 
85-C-0010 and under Office of Naval Research contract N00014-85-K-0124. WELG is 
supported in part by the Matsushita Chair of Electrical Engineering. 

©Massachusetts Institute of Technology and Cornell University 1989. 



1 Department of Computer Science, Cornell University, Ithaca NY 



1. Introduction 

A central problem in machine vision is that of recognizing partially occluded ob- 
jects from noisy data. Recognition systems generally search for a matching between 
elements of an object model and instances of those elements in the data, recovering 
a transformation that maps part of the model onto part of the image. There are 
a number of different approaches to this model-based recognition problem, includ- 
ing clustering in parameter space (e.g., Stockman [1987], Stockman et al. [1982], 
Thompson and Mundy [1987]), searching a tree of corresponding model and image 
features (e.g., Grimson [1989a, 1989b], Grimson and Lozano-Perez [1984, 1987], Ettinger
[1987, 1988], Murray [1987a, 1987b], Murray and Cook [1988], Ayache and
Faugeras [1986], Faugeras and Hebert [1986], Ikeuchi [1987]), and directly searching 
for possible transformations from a model to an image (e.g., Fischler and Bolles 
[1981], Huttenlocher and Ullman [1987, 1988]) (see also Chin and Dyer [1986] and 
Besl and Jain [1985] for more comprehensive reviews). These approaches all share 
the common property that a decision is made about the presence or absence of an 
object on the basis of geometric evidence acquired from the sensory input. In this 
paper we investigate the nature of this decision process and develop a formal means 
for deciding when a match should be accepted as correct. 

To determine what constitutes an acceptable match of a model to an image, 
most recognition systems use one of two ad hoc approaches. The first approach is to 
find all possible interpretations and order them by some measure of completeness, 
such as the percentage of the model accounted for. The best interpretations, as 
defined by this measure, are then taken as correct solutions. Suppose one is looking 
for interpretations in the data of a particular object from the library of possible 
objects. If an instance of a particular object model is present in the scene and the 
measure of completeness is well behaved, then this approach will correctly find the 
interpretations. If no instance of the object model is present in the scene, however,
the interpretations of this object that best account for the data are in fact incorrect.
In this case, one must either accept false interpretations or have some independent
means of deciding whether or not the object is present. Furthermore, this approach
is computationally expensive, because finding all possible interpretations requires
exploring the entire search space.

The second common approach is to again apply a measure of completeness to
each hypothesized match, but to use this measure to terminate the search early,
as soon as an interpretation is found whose measure exceeds some threshold.
Termination can be based strictly on the completeness of the current interpretation, 
or can involve examining the data for additional confirming or refuting evidence. 
Finding additional evidence can increase the measure of completeness of an inter- 
pretation, but one is still left with the problem of deciding whether an interpretation 
is good enough to accept as a correct match. 

Current methods for deciding whether a match is correct are based on empirically
determined thresholds. A more rigorous approach would be to derive conditions
under which to accept a match that are based on fundamental grounds. In this paper
we analyze the problem of determining what constitutes a good match of a model 
to an image. In particular we derive an expression that relates the probability of a 
false match to the fraction of model features that are accounted for by the match. 
This expression is a function of the number of model features, the number of image 
features, and a bound on the degree of tolerable sensor noise. The derivation results 
from an examination of the likelihood of false positives (i.e., interpretations that are 
incorrect but arise due to a random coincidence of events in the image). 

We then use this relation to define a threshold on the fraction of features that 
must be matched in order to limit the probability of a random coincidence to some 
level. We analyze some existing recognition systems ([Grimson and Lozano-Perez,
1984, 1987], [Ayache and Faugeras, 1986]) and find that our technique yields thresholds
similar to the ones that were determined empirically for these systems. This
provides experimental evidence of the validity of the technique, and suggests that it 
can be used profitably to set thresholds for other recognition tasks and systems. 

Specifically, we address the following question: 

• Suppose that we are given a model with $m$ features, a set of $s$ data features
from a sensor, and bounds $\epsilon_p$ and $\epsilon_a$ on the positional and orientational error
in the data. Further, suppose that some recognition method has found a match
that accounts for a fraction $f$ ($f \in [0, 1]$) of the $m$ model features. What is
the relation between $f$ and the likelihood $\delta$ that such a match can occur at
random?

We use this relation to set a threshold on the minimum fraction of model features
that must be matched, $f_0$, such that the likelihood of such a match occurring at
random is small (e.g., $\delta < .001$). Note that there is not necessarily a value of $f_0$ for
every choice of $\delta$ (in particular, as $\delta$ gets very small, or as $m$, $s$, $\epsilon_p$ or $\epsilon_a$ get very large,
there may be no fraction of model features that limits the probability of a random
match to $\delta$).

There are three basic steps to the technique. First, given a particular type 
of feature, the type of transformation from a model to an image, and a bound 
on the sensor error, we characterize the set of transformations that are consistent 
with a single pairing of model and image features. This set of transformations is 
described by a volume V in the transformation space (a d-dimensional space with 
one dimension corresponding to each of the d transformation parameters). 

We then determine the probability, $\Pr\{e \ge l\}$, that the number of events (in
this case the number of such volumes) that intersect at a common point in the
transformation space is at least $l$. This likelihood is an estimate of how often a
match of $l$ features will occur at random. The probability of $l$ volumes intersecting
is estimated by considering the limiting case of a statistical occupancy problem as 
the number of observations and cells goes to infinity [Feller, 1968]. This method is 
similar to that used for the analysis of the generalized Hough transform in [Grimson 
and Huttenlocher, 1989]. 



Finally, the probability that $l$ volumes will intersect at random is used to set a
threshold on the minimum fraction of model features, $f_0$, that must be matched in
order to accept an interpretation. In particular we set the threshold $f_0$ such that
$m f_0 \le l$, and $\Pr\{e \ge l\}$ is a tolerable false matching rate $\delta$.

2. The Space of Transformations 

For rigid objects, the pose of an object with respect to a sensor can be character- 
ized by a transformation from the model to the sensor coordinate systems. In this 
paper we focus on the case where this transformation is a similarity (i.e., consist- 
ing of translation, rotation, and scaling). The set of possible solutions to a given 
recognition problem can be viewed as a transformation space having one dimension 
corresponding to each parameter of the transformation from model to sensor coor- 
dinates. A point in this transformation space defines a pose of an object, which 
in turn defines a possible solution to the recognition problem. For example, with 
a two-dimensional image and world, the transformation space is four-dimensional 
(translation in x and y, rotation in the plane, and scaling). 

A matching of a model feature with an image feature (such as an edge or vertex) 
defines a range of possible transformations from the model to the image, that is, a 
volume in the transformation space. The size and shape of this volume depend on
the type of feature and on the degree of accuracy in the measurement of the features. 
In this section we present an analytic expression for the size of this volume. This 
expression is related to that developed in [Grimson and Huttenlocher, 1989] for 
characterizing the range of feasible transformations when the transformation space 
is tessellated at some sampling rate. Here, we determine an expression for the volume
of feasible transformations in a continuous transformation space. 

In this section we limit the discussion to the case of two-dimensional matching 
problems where the transformation is an isometry (translation and rotation without 
scaling), and the features are linear edge fragments. The method also applies to 
point features, as discussed at the end of the section. A similar analysis holds for 
three-dimensional matching problems and for problems involving change of scale, as 
described in the appendix. 

Consider the problem of recognizing a two-dimensional polygonal model from 
noisy, occluded data. If M is the model coordinate system, we let 

$M_J$ be the vector to the midpoint of the $J$th model edge, measured in $M$,

$T_J$ be the unit tangent of the edge, measured in $M$,
$L_J$ be the length of the edge.

We let $m_j, t_j, \ell_j$ denote similar parameters for the $j$th data edge, measured in the
sensor based coordinate system, J. (Note that we use upper case characters to 
distinguish model parameters and lower case characters to distinguish sensory data 
parameters.) 



The transformation from model coordinates to sensor coordinates may be rep- 
resented by 

$$v_s = R_\theta V_m + V_0$$
where $V_m$ is a vector in model coordinates, $R_\theta$ is a rotation matrix corresponding
to an angle of $\theta$, $V_0$ is a translation offset, and $v_s$ is the corresponding vector in
sensor coordinates.

We need to know what transformations will map a model edge to a data edge.
First, if $\ell_j > L_J$, we assume that the two edges cannot match. Thus, suppose that
$\ell_j \le L_J$. Then the rotation needed to align the two tangents is given by the angle
$\theta_m$ between $T_J$ and $t_j$, and this defines a rotation matrix $R_{\theta_m}$. If we apply this
rotation to the set of edge points

$$\left\{ M_J + \alpha T_J \;\middle|\; \alpha \in \left[ -\frac{L_J}{2}, \frac{L_J}{2} \right] \right\}$$

we get a set of transformed points

$$\left\{ R_{\theta_m} M_J + \alpha R_{\theta_m} T_J \;\middle|\; \alpha \in \left[ -\frac{L_J}{2}, \frac{L_J}{2} \right] \right\}.$$

To align the edges, we need to translate these rotated points. Now, because
$\ell_j \le L_J$, there are many transformations that will cause the edges to overlap.
Consider one endpoint of the data edge

$$p_1 = m_j - \frac{\ell_j}{2} t_j.$$

If this happens to coincide with a model edge endpoint, then

$$R_{\theta_m} M_J - \frac{L_J}{2} R_{\theta_m} T_J + V_0 = m_j - \frac{\ell_j}{2} t_j,$$

so that the translation is

$$V_0 = m_j - R_{\theta_m} M_J + \frac{L_J - \ell_j}{2} R_{\theta_m} T_J$$

because $R_{\theta_m} T_J = t_j$. Similarly, if the other endpoints align, we get

$$V_0 = m_j - R_{\theta_m} M_J - \frac{L_J - \ell_j}{2} R_{\theta_m} T_J.$$



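Because any intermediate position is also acceptable, the set of translations consistent
with matching model edge $J$ to data edge $j$ is given by

$$\left\{ m_j - R_{\theta_m} M_J + \beta R_{\theta_m} T_J \;\middle|\; \beta \in \left[ -\frac{L_J - \ell_j}{2}, \frac{L_J - \ell_j}{2} \right] \right\}. \qquad (1)$$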

Hence, matching model edge $J$ to data edge $j$ yields a set of points in transform
space $V$, with a single value for the rotation parameter and a set of values for the
translation, that correspond to a line of length $L_J - \ell_j$, with orientation $R_{\theta_m} T_J$ in
the x-y plane. This is shown in Figure 1.
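
The noise-free construction above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `rotate` and `feasible_translations` and the tuple representation of vectors are assumptions made here, not part of the original system. It returns the aligning rotation and the two endpoints of the line of feasible translations.

```python
import math

def rotate(theta, v):
    """Rotate the 2D vector v counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def feasible_translations(M_J, T_J, L_J, m_j, t_j, l_j):
    """Return (theta_m, start, end) for the noise-free translation segment,
    or None when the data edge is longer than the model edge."""
    if l_j > L_J:
        return None
    # Signed angle that rotates the model tangent T_J onto the data tangent t_j.
    theta_m = math.atan2(T_J[0] * t_j[1] - T_J[1] * t_j[0],
                         T_J[0] * t_j[0] + T_J[1] * t_j[1])
    RM = rotate(theta_m, M_J)
    RT = rotate(theta_m, T_J)
    half = (L_J - l_j) / 2.0
    # Translation that aligns the two edge midpoints.
    centre = (m_j[0] - RM[0], m_j[1] - RM[1])
    start = (centre[0] - half * RT[0], centre[1] - half * RT[1])
    end = (centre[0] + half * RT[0], centre[1] + half * RT[1])
    return theta_m, start, end
```

The segment between `start` and `end` has length $L_J - \ell_j$ and lies along the rotated model tangent, as the text describes.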

Figure 1. Range of feasible translations, for fixed $\theta$ and with no position error. The line
in the direction of $R_\theta T_J$ denotes the set of feasible translations for a given value of $\theta$.

This, however, ignores the issue of noise in the measurements. In practice, we
may only know the position of the endpoints of the data edge to within some ball
(which in two dimensions is just a circle) of radius $\epsilon_p$, and the orientation to within
an angular error of $\epsilon_a$. For the case of two-dimensional lines, these error ranges
are related. Given endpoint variations of $\epsilon_p$, it is straightforward to show that the
maximum angular variation occurs when the correct line is tangent to both circles
of radius $\epsilon_p$ about the two endpoints, and is given by

$$\epsilon_a = \tan^{-1} \left[ \frac{2\epsilon_p}{\sqrt{\ell^2 - 4\epsilon_p^2}} \right]$$

provided $\ell > 2\epsilon_p$.
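
As a small numerical check of the relation between the two error bounds, the sketch below evaluates the angular bound from the tangent-line geometry described above. It assumes that a line tangent to both endpoint circles makes an angle with sine $2\epsilon_p/\ell$ to the true edge; the function name `max_angular_error` is hypothetical.

```python
import math

def max_angular_error(eps_p, length):
    """Largest angular deviation of an edge of the given length whose
    endpoints are each known only to within a circle of radius eps_p.
    Assumes the cross-tangent construction: sin(eps_a) = 2 * eps_p / length."""
    if length <= 2 * eps_p:
        raise ValueError("bound is only defined for length > 2 * eps_p")
    return math.atan2(2 * eps_p, math.sqrt(length ** 2 - 4 * eps_p ** 2))
```

For an edge four times as long as the positional error radius, this gives an angular bound of 30 degrees.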

Inclusion of error effects on position measurements implies that the line of feasible
translations for a given rotation (as given by equation (1)) must be expanded to
include any points in the parameter space within $\epsilon_p$ of that line. Further, this
expansion into a region must be repeated for each value of $\theta$ in $[\theta_m - \epsilon_a, \theta_m + \epsilon_a]$.
Note that this carves out a skewed volume in transform space, because the region's
center and orientation are functions of $\theta$ (see equation (1)). This observation has
been carefully analyzed in [Clemens, 1986]. The volume is illustrated in Figure 2.

Thus, given $M_J, T_J, L_J, m_j, t_j, \ell_j$, we will use the following conditions:

• If $\ell_j - 2\epsilon_p > L_J$, then there are no consistent transformations,

• Otherwise, the set of feasible transformations is denoted by the volume

$$V(j,J) = \bigcup_{\theta \in [\theta_m - \epsilon_a, \theta_m + \epsilon_a]} S(\theta, j, J)$$

where an individual slice $S(\theta,j,J)$ is the set of translations lying within $\epsilon_p$ of the
line of equation (1), evaluated at rotation $\theta$ rather than $\theta_m$.

Figure 2. Range of feasible translations, with error. The region enclosed in solid lines
indicates the slice $S(\theta,j,J)$ for a particular value of $\theta$. As $\theta$ varies, this slice rotates
through a helical path, as indicated by the region enclosed by a dashed line.

We can use this expression to determine the size of the set of feasible transformations.
Since each slice $S(\theta,j,J)$ consists of two hemicircles and a rectangle, it is easy to
show that the volume of the region defined above is given by

$$c_{j,J} = 2\epsilon_a \left[ 2\epsilon_p (L_J - \ell_j) + \pi \epsilon_p^2 \right].$$

The term in square brackets corresponds to the area of a single slice; this is integrated
over a range of angles, yielding the $2\epsilon_a$ term.

For simplicity, we let the data edge have a length

$$\ell_j = (1 - \alpha_{j,J}) L_J$$
where $\alpha_{j,J}$ denotes the amount of occlusion of the edge, so that the expression for
the volume becomes

$$c_{j,J} = 2\epsilon_a \left[ 2\epsilon_p \alpha_{j,J} L_J + \pi \epsilon_p^2 \right]. \qquad (2)$$

If we are dealing with point features, rather than extended edges, the above
result can be specialized. Here $L_J \to 0$ so that equation (2) becomes

$$c_{j,J} = 2\epsilon_a \pi \epsilon_p^2. \qquad (2a)$$



Now, what is $\epsilon_a$ in the case of a point feature? If the feature is a vertex, one can use
the direction of the bisector of the two edges defining the vertex as the orientation
of the vertex, and hence can bound the error in measuring that orientation as $\epsilon_a$.
Similarly, if the vertex is a curvature extremum or a point of curvature inflection,
one can use the local tangent of the curve to define the orientation, and $\epsilon_a$ is again
defined by a bound on measuring this orientation. If the vertices are truly isolated
points, then $\epsilon_a = \pi$. In any event, our analysis provides estimates for $c_{j,J}$ both for
the case of edge features and for the case of vertices.
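
The two volume expressions, equation (2) for edges and equation (2a) for points, are simple enough to evaluate directly. The following Python sketch does so; the function names `edge_volume` and `point_volume` are illustrative choices made here, and angles are assumed to be in radians.

```python
import math

def edge_volume(eps_a, eps_p, L_J, occlusion):
    """Volume of transformation space consistent with one edge pairing,
    following equation (2): 2*eps_a * [2*eps_p*occlusion*L_J + pi*eps_p**2]."""
    slice_area = 2 * eps_p * occlusion * L_J + math.pi * eps_p ** 2
    return 2 * eps_a * slice_area

def point_volume(eps_a, eps_p):
    """Point-feature special case (equation (2a)): the edge length goes to 0."""
    return edge_volume(eps_a, eps_p, 0.0, 0.0)
```

For a truly isolated point one would pass `eps_a = math.pi`, as the text notes.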

For the case of a rigid two-dimensional isometric transformation, we have characterized
the volume of transformation space, $c_{j,J}$, that is consistent with a single
data-model pairing $(j,J)$. This expression is given by equation (2) for edge features
and equation (2a) for point features. The expression is a function of the noise in the
data measurements, $\epsilon_p$ and $\epsilon_a$, and in the case of edges is further a function of the
amount of occlusion, $\alpha_{j,J}$, and the length of the model edge, $L_J$. In the appendix we
consider adding scaling to the transformation as well as the case of three-dimensional 
transformations. We now turn to the question of how these volumes interact. 

3. The Probability of a Conspiracy 

In the previous section, we characterized the volume of transformation space that 
is consistent with a data-model pairing. If two such volumes overlap, then their 
intersection defines the set of transformations that are consistent with both of the 
data-model pairings. Thus a correct match of a model to an image will lie in the 
intersection of several volumes. In this section we investigate the likelihood that $l$
volumes in transformation space will intersect at random. Such an event corresponds
to an arrangement of image features that happens to be consistent, within error
bounds, with $l$ of the model features, but which does not actually correspond to an
instance of the object.

The likelihood that $l$ transformation space volumes will intersect at random is
a function of their number and size. The number of volumes depends on the number
of model and image features. The size of each volume depends on the amount of
noise in the data, the type of feature, and for edge features the amount of occlusion
of the edges. In order to be confident that a match accounting for $l$ model features
is correct, we would like to choose $l$ such that the likelihood of a random matching
of that size is very small.

In order to characterize the likelihood that several volumes will intersect at 
random we make use of a statistical occupancy model. In the discrete case, if r 
events are uniformly randomly distributed across n buckets, an occupancy model 
can be used to estimate the probability that a given bucket will contain k events. 
The events in our case are points in the volumes in transformation space, and the 
buckets are points in the transformation space itself. These events and buckets are 
continuous rather than discrete, and thus we are concerned with the limiting case
as $n, r \to \infty$.



The volume of transformation space defined by each incorrect model and image
feature pairing is independent of the correct match. Furthermore, we assume that 
the image features are independent of one another. Thus we can model the volumes 
in transformation space as independent random events. The distribution of these 
volumes depends on the image features, which are unknown, so we assume the 
uniform distribution as an approximation. 

While the volumes in transformation space can reasonably be viewed as in- 
dependent random events, we are modeling the probability of events occurring at 
points in these volumes. As the number of volumes, $R$, gets large (compared with
the ratio of the total size of the transformation space to the size of each volume,
$V/c$) the overall distribution of points in the space is also random. For the cases of
interest here $Rc > V$, so the assumption of independent random pointwise events
is a reasonable approximation.

An alternative explanation of the independence of the pointwise events in trans- 
formation space is the following. The probability that a particular point is consistent 
with a given data-model pairing is equivalent to the probability that the point lies 
within some neighborhood of the centroid of the given volume in transformation 
space. Since the image features are assumed to be independently randomly dis- 
tributed, this probability is independent of the choice of image feature. Thus in the 
following analysis we assume that the events in transformation space are randomly 
distributed, and use the uniform distribution as an approximation. 

Given a uniform random distribution of $r$ events into $n$ cells such that each of
the $n^r$ placements has equal probability, the probability that a given cell contains
exactly $k$ events is given by the binomial distribution

$$p_k = \binom{r}{k} \left( \frac{1}{n} \right)^k \left( 1 - \frac{1}{n} \right)^{r-k}.$$

In the limit, as $n, r \to \infty$, where the ratio $r/n \to \lambda$, the binomial distribution is
approximated by

$$p_k \approx \frac{\lambda^k}{k!} e^{-\lambda}.$$

This distribution is often termed the Maxwell-Boltzmann statistic (for a standard
reference see [Feller, 1968]).

In addition to the Maxwell-Boltzmann distribution, another common distribution
used in occupancy problems is the Bose-Einstein statistic, which has an
experimental basis in particle physics. Under the Bose-Einstein model, for large $r$
and $n$ where $r/n \to \lambda$, the limiting case is the geometric distribution, where

$$p_k \approx \frac{\lambda^k}{(1 + \lambda)^{k+1}}.$$

This distribution has a long tail as $k \to \infty$, and thus predicts large peaks with a
higher probability than does the Maxwell-Boltzmann model. We are interested in
establishing conservative bounds on the likelihood that a large number of volumes
will intersect at random; thus we use the Bose-Einstein statistic because it provides
a higher estimate of this likelihood.
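
To see how much heavier the geometric tail is, the two limiting distributions can be compared numerically. The sketch below is illustrative only: the function names are chosen here, and the value $\lambda = 0.7$ is an arbitrary example rather than a figure from the paper.

```python
import math

def poisson_pk(k, lam):
    """Maxwell-Boltzmann (Poisson) limit: lam**k * exp(-lam) / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def geometric_pk(k, lam):
    """Bose-Einstein (geometric) limit: lam**k / (1 + lam)**(k + 1)."""
    return lam ** k / (1 + lam) ** (k + 1)

def tail(pk, l, lam):
    """Probability of l or more events: 1 minus the sum of p_k for k < l."""
    return 1.0 - sum(pk(k, lam) for k in range(l))

if __name__ == "__main__":
    lam = 0.7  # hypothetical occupancy ratio, for illustration only
    for l in (4, 8, 12):
        print(l, tail(poisson_pk, l, lam), tail(geometric_pk, l, lam))
```

For larger values of `l`, which are the ones of interest when bounding large random coincidences, the geometric tail exceeds the Poisson tail, which is why it gives the more conservative estimate.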



The parameter $\lambda$ of the occupancy model is the ratio of the occupied volumes
of the transformation space to the total size of the transformation space. From
equations (2) and (2a) in the previous section we know that each pair of model and
image features defines a volume of size $c_{j,J}$ in transformation space. There are $ms$
such volumes for $m$ model features and $s$ image features, so the occupied volume of
the transformation space is given by

$$\sum_{j=1}^{s} \sum_{J=1}^{m} c_{j,J}.$$

The total size of the transformation space is just the product of the ranges for the
dimensions of the space. Each rotational dimension ranges over the interval $[0, 2\pi]$,
and each translational dimension ranges over $[0, D]$, where $D$ is the linear extent of
the image. Thus in the case of a two-dimensional isometry (translation and rotation)
we get

$$\lambda = \frac{\sum_{j=1}^{s} \sum_{J=1}^{m} c_{j,J}}{2\pi D^2}.$$

We can simplify this to

$$\lambda = m s c$$
where $c$ is the average normalized volume size.

In the case of two-dimensional edge features, from equation (2) we obtain the
average normalized volume size

$$c = \frac{2\epsilon_a \left[ 2\epsilon_p \alpha L + \pi \epsilon_p^2 \right]}{2\pi D^2}$$

where $L$ is the average edge length, $\alpha$ is the average amount of occlusion of the
edges (the average value of $\alpha_{j,J}$), and where we have incorporated the normalizing
term $2\pi D^2$. Note that as expected $c$ increases as the noise $\epsilon_a, \epsilon_p$ increases, and also
$c$ increases as the average amount of occlusion of the edges $\alpha$ increases.

For two-dimensional point features (with associated orientations), the average
normalized volume size is given by

$$c = \frac{2\epsilon_a \pi \epsilon_p^2}{2\pi D^2}.$$
Note that we can restrict e < n and e p < y. In the extreme case, this can lead to 
c > 1, which does not make physical sense. We should really take the minimum of 
the above expressions and 1, but in practice c is usually much smaller than this and 
hence we ignore this special case. 

A particular recognition task thus defines a value for $\lambda$, based on the type of
transformation from the model to the image, the type of features, the number of
model features $m$, the number of data features $s$, and a bound on the positional and
angular error, $\epsilon_p$ and $\epsilon_a$.
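
As a rough sketch of how a recognition task maps to a value of $\lambda$, the Python below computes the average normalized volume and the occupancy ratio. The function names and the default arguments are assumptions made for illustration; the clamp to 1 follows the remark above that the normalized volume should really be the minimum of the expression and 1.

```python
import math

def normalized_volume(eps_a, eps_p, D, mean_length=0.0, mean_occlusion=0.0):
    """Average normalized volume c for a 2D isometry.
    With mean_length = 0 this reduces to the point-feature case."""
    raw = 2 * eps_a * (2 * eps_p * mean_occlusion * mean_length
                       + math.pi * eps_p ** 2)
    total = 2 * math.pi * D ** 2  # size of the whole transformation space
    return min(raw / total, 1.0)

def occupancy_lambda(m, s, c):
    """Occupancy ratio lambda = m * s * c for m model and s data features."""
    return m * s * c
```

With $m = 32$, $s = 100$ and $c = .0002215$, the values used for the figures later in the paper, `occupancy_lambda` gives approximately 0.709.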

Given a value for $\lambda$, we are interested in the probability that $l$ or more of the
volumes intersect at random, which is given by

$$\Pr\{e \ge l\} = 1 - \sum_{k=0}^{l-1} p_k.$$

This corresponds to an arrangement of data features occurring at random such
that $l$ of the model features can be matched (within the error bounds) to those data
features. From $\Pr\{e \ge l\}$ we can determine the fraction of model features, $f_0$, such
that the probability of $m f_0$ features being matched at random is less than some
predefined level, $\delta$. This value is just the smallest $f$ such that

$$\Pr\{e \ge mf\} \le \delta$$

i.e.

$$f_0 = \min \{ f \mid \Pr\{e \ge mf\} \le \delta \}.$$

We then choose $\delta$ such that the probability of a false match is small, for example
$\delta = .001$.
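
The definition of $f_0$ as a minimum can be evaluated by direct search over the number of matched features. The sketch below assumes the geometric (Bose-Einstein) form of $p_k$ adopted in this section and restricts $f$ to multiples of $1/m$; the function names are hypothetical.

```python
def tail_probability(l, lam):
    """Pr{e >= l}: one minus the sum of the geometric p_k for k < l."""
    return 1.0 - sum(lam ** k / (1 + lam) ** (k + 1) for k in range(l))

def smallest_fraction(m, s, c, delta):
    """Smallest fraction l/m whose random-match probability is at most delta,
    or None if even matching all m model features does not achieve it."""
    lam = m * s * c
    for l in range(m + 1):
        if tail_probability(l, lam) <= delta:
            return l / m
    return None
```

For example, with $m = 32$, $s = 100$, $c = .0002215$ and $\delta = .001$, the search stops at 8 of the 32 features, a fraction of 0.25. A return value of `None` corresponds to the situation noted in the introduction, where no fraction of the model can limit the false-match probability to $\delta$.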

The analysis in this section simply counts each pairing of a data feature with 
a model feature equally. It is also possible to weight the events by the amount of 
model accounted for. Below we consider the case of weighting each feature match 
by the length of the matched edge. 

4. Deriving Formal Thresholds 

We have used an occupancy model to determine an expression for the probability
that $l$ or more volumes in transformation space will intersect at random. This
expression is a function of the number of features, the type of features, and bounds
on the sensor error. The expression was then used to set a threshold, $f_0$, on the
fraction of model features that must be matched in order to limit the probability of
a random matching to some level. In this section we derive a closed-form expression
for $f_0$.

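Under Bose-Einstein statistics, we have

$$p_k \approx \frac{\lambda^k}{(1 + \lambda)^{k+1}}$$
or equivalently

$$p_k \approx \frac{1}{1 + \lambda} \left( \frac{\lambda}{1 + \lambda} \right)^k.$$

The probability that there will be $l$ or more events occurring at a point is given by

$$\Pr\{e \ge l\} = 1 - \sum_{k=0}^{l-1} p_k.$$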

We are interested in finding a threshold for distinguishing correct from random
interpretations. This can be done by setting the threshold, $f_0$, to be the fraction
of model features such that $l = m f_0$. If we choose a value $\delta$ for the probability
that there will be $m f_0$ or more events occurring at random (i.e. a limit on the false
positive rate), then the condition on $f_0$ is given by

$$1 - \sum_{k=0}^{m f_0 - 1} p_k \le \delta.$$

Substituting for $p_k$ yields

$$1 - \frac{1}{1 + \lambda} \sum_{k=0}^{m f_0 - 1} \left( \frac{\lambda}{1 + \lambda} \right)^k \le \delta$$

and using the geometric series relationship yields

$$1 - \frac{1}{1 + \lambda} \left[ \frac{\left( \frac{\lambda}{1 + \lambda} \right)^{m f_0} - 1}{\frac{\lambda}{1 + \lambda} - 1} \right] \le \delta.$$
We can isolate $f_0$ by appropriate algebra:

$$f_0 \ge \frac{\log \left( \frac{1}{\delta} \right)}{m \log \left( 1 + \frac{1}{\lambda} \right)} \qquad (3)$$

where

$$\lambda = m s c.$$
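
The algebra behind equation (3) can be spelled out in a few lines. The shorthand $q = \lambda / (1 + \lambda)$ is introduced here only for compactness and does not appear in the original derivation.

```latex
\begin{align*}
\sum_{k=0}^{m f_0 - 1} p_k
  &= \frac{1}{1+\lambda} \cdot \frac{1 - q^{m f_0}}{1 - q}
   = 1 - q^{m f_0}
  && \text{since } 1 - q = \frac{1}{1+\lambda}, \\
1 - \sum_{k=0}^{m f_0 - 1} p_k \le \delta
  &\iff q^{m f_0} \le \delta \\
  &\iff m f_0 \log \frac{1}{q} \ge \log \frac{1}{\delta} \\
  &\iff f_0 \ge \frac{\log (1/\delta)}{m \log (1 + 1/\lambda)}
  && \text{since } \frac{1}{q} = 1 + \frac{1}{\lambda}.
\end{align*}
```

In other words, under the geometric model the probability of $l$ or more coincident volumes is exactly $q^l$, and the threshold follows by taking logarithms.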



The value of $c$ depends on the particular type of feature being matched and
the bounds on the sensor error. In the case of two-dimensional edge fragments
considered above, we derived

$$c = \frac{2\epsilon_a \left[ 2\epsilon_p \alpha L + \pi \epsilon_p^2 \right]}{2\pi D^2}.$$

Note that equation (3) exhibits expected behavior. If the noise in the data
increases, then $c$ increases, and so does the bound on $f_0$. Similarly, as the amount
of occlusion increases, then so does $c$ and thus the bound on $f_0$. As either $m$ or $s$
increases so does the bound on $f_0$, and as $\delta$ decreases $f_0$ increases.
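
Equation (3) is straightforward to evaluate. The following is a minimal Python sketch; the function name `threshold_fraction` is an illustrative choice, and the sample values are the ones quoted for the figures in this section.

```python
import math

def threshold_fraction(m, s, c, delta):
    """Closed-form bound of equation (3):
    f0 >= log(1/delta) / (m * log(1 + 1/lambda)), with lambda = m * s * c."""
    lam = m * s * c
    return math.log(1.0 / delta) / (m * math.log(1.0 + 1.0 / lam))

if __name__ == "__main__":
    # m = 32, s = 100, c = .0002215, delta = .001
    print(threshold_fraction(32, 100, 0.0002215, 0.001))
```

With these values the bound comes out a little under one quarter of the model features. The function returns the raw bound, which may exceed 1.0 when no fraction of the model suffices.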

Also note that for large values of $ms$, the logarithm in the denominator can be
approximated by its first order term, and one gets the following approximation

$$f_0 \ge s c \log \frac{1}{\delta}.$$



Thus, in the limit, the bound on the fraction of the model is linear in the number of 
sensory features, linear in the average size of the volumes in transformation space, 
and varies logarithmically with the inverse probability of a false match. 
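
The quality of this first-order approximation can be checked numerically. The sketch below compares the exact bound of equation (3) with the linear approximation as $m$ grows; the function names are hypothetical and the parameter values are only examples.

```python
import math

def exact_threshold(m, s, c, delta):
    """Equation (3) with lambda = m * s * c."""
    lam = m * s * c
    return math.log(1.0 / delta) / (m * math.log(1.0 + 1.0 / lam))

def approx_threshold(s, c, delta):
    """Large-ms approximation: s * c * log(1/delta), independent of m."""
    return s * c * math.log(1.0 / delta)

def relative_gap(m, s, c, delta):
    """How far the exact bound lies above the approximation, as a ratio."""
    return exact_threshold(m, s, c, delta) / approx_threshold(s, c, delta) - 1.0
```

Because $\log(1 + x) < x$ for $x > 0$, the exact bound always lies above the approximation, and the gap shrinks roughly like $1/(2\lambda)$ as $ms$ grows.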

The expression for $f_0$ in equation (3) can yield values that are greater than 1.0,
which makes no sense as a fraction of the model features. When $f_0$ is greater than
1.0 it means that for the given number and type of features, and the given bounds
on sensor error, it is not possible to limit the probability of a false match to the
chosen $\delta$ (even if all the model features are matched to some sensor feature).

Thus to obtain a value for the fraction of model features that must be matched
in order to limit the probability of a random conspiracy to $\delta$, we simply need to compute
$c$ for the particular parameters of our recognition task, and then use equation
(3) to compute $f_0$.

There are several possible choices for $\delta$. One could simply set $\delta$ to be some
small number, e.g. $\delta = .001$, so that a false positive is likely to arise no more than
one time in a thousand. One could also set $\delta$ as a function of the scene complexity,
e.g. some multiple of the inverse of the total number of data-model pairings

$$\delta = \frac{\mu}{ms},$$

where $\mu$ is an arbitrarily chosen constant.

A third possibility is to set $\delta$ so that the likelihood of a false positive, integrated
over the entire transformation space, is small (e.g. less than 1). The idea is to
determine the appropriate value of $\delta$ such that the expectation is that no random
matches will occur. If we let $v$ be a measure of the sensitivity of our system in
distinguishing transformations, then we could choose $\delta$ as

$$\delta = \frac{v}{2\pi D^2}.$$

For example, we could set $v$ to be a function of the noise in the data measurements,
given by the uncertainty in orientation times the uncertainty in position: $(2\epsilon_a)(\pi \epsilon_p^2)$.
In this particular case, we get

$$f_0 \ge \frac{\log \left( \frac{D^2}{\epsilon_a \epsilon_p^2} \right)}{m \log \left( 1 + \frac{1}{\lambda} \right)}. \qquad (3a)$$
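
The three ways of choosing the false-match level can be written down side by side. This is only a sketch under stated assumptions: the function names are invented here, the scene-complexity choice is read as a constant divided by the number of data-model pairings $ms$, and the sensitivity measure is taken to be the orientation uncertainty times the positional uncertainty area, $(2\epsilon_a)(\pi\epsilon_p^2)$.

```python
import math

def delta_fixed(level=0.001):
    """First choice: a fixed small false-positive rate."""
    return level

def delta_scene(m, s, mu):
    """Second choice: a multiple mu of the inverse number of pairings m * s."""
    return mu / (m * s)

def delta_sensitivity(eps_a, eps_p, D):
    """Third choice: sensitivity volume over the size of transformation space,
    which simplifies to eps_a * eps_p**2 / D**2."""
    v = (2 * eps_a) * (math.pi * eps_p ** 2)
    return v / (2 * math.pi * D ** 2)
```

Any of these values can then be substituted for the false-match level in equation (3).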

To illustrate the values for $f_0$, we graph representative examples in Figures 3-5.
Figure 3 displays graphs of $f_0$ as a function of $s$, with $m = 32$, $c = .0002215$ (these
numbers are taken from the recognition systems analyzed in section 5). Each graph
is for a different value of $\delta$. Note that as $s$ gets large, the graphs become linear, as
expected.

Figure 4 displays $f_0$ as a function of $m$ for different values of $\delta$. Here, $s = 100$,
$c = .0002215$. Note that as expected, when $m$ becomes large, $f_0$ becomes a
constant independent of $m$.

Figure 5 displays $f_0$ as a function of the sensor error, for different values of $\delta$.
Here, $s = 100$, $m = 32$. The percentage of error along the horizontal axis $p$ is used
to define sensing errors of $\epsilon_a = p\pi$ and $\epsilon_p = pL$. As expected, the threshold on $f_0$
increases with increasing error.

Allowing for weighted votes 

The preceding analysis treated each data-model feature pairing equally, and bounded
the probability that $l$ such pairings would be consistent at random. Another approach
is to weight the contribution of each data-model pairing by some measure.
One common scheme is to use the size of each data feature as a weight. In the case
of two-dimensional edges, for example, a data-model pairing $(j, J)$ would carry a
weight of $\ell_j$ (the length of the data edge), so that transformations consistent with
pairings of long data edges to model edges would be more highly valued than those
involving short data edges.

Figure 3. Graphs of bounds on the threshold. $f_0$ is graphed as a function of $s$, with other
parameters fixed. The three graphs are for $\delta = .0001, .001, .01$ from top to bottom respectively.

Figure 4. Graphs of bounds on the threshold. $f_0$ is graphed as a function of $m$, with other
parameters fixed. The three graphs are for $\delta = .0001, .001, .01$ from top to bottom respectively.

Figure 5. Graphs of bounds on the threshold. $f_0$ is graphed as a function of error, with
other parameters fixed. The three graphs are for $\delta = .0001, .001, .01$ from top to bottom
respectively. The percentage of error along the horizontal axis $p$ is used to define sensing
errors of $\epsilon_a = p\pi$ and $\epsilon_p = pL$.

We can modify our preceding analysis to handle this case as well. Note that the
parameter $\lambda$ essentially measures the average "vote" at each point in the transformation
space. Since we have assumed that each volume of transformations consistent
with some data-model feature pairing is independent, we can derive the expected
weighted "vote" at any point in transformation space. As one might expect, due to
the independence, this simply yields

$$\lambda = m s \bar{\ell} c$$

where $\bar{\ell}$ is the average length of the data edges. Note that this is the average length
over all data edges, not just those that match the object.
In this case we are interested in bounds on $f_0$ such that

$$1 - \sum_{k=0}^{mLf_0 - 1} p_k \le \delta.$$

Working through the same algebra as in the previous section leads to the following
bound:

$$f_0 \ge \frac{\log\frac{1}{\delta}}{mL \log\left(1 + \frac{1}{ms\bar{\ell}c}\right)} \tag{4}$$



5. Some Real World Examples 

To demonstrate the utility of our method, in this section we analyze some working 
recognition systems that utilize a threshold on the fraction of model features which 
must be accounted for by a match. We find that the analysis predicts thresholds 
that are close to those that were determined experimentally. This suggests that 
the technique can be profitably used to analytically determine thresholds for model- 
based matching. Because our analysis shows that the proper threshold varies with 
the number of model and data features, it is important to be able to set the threshold
as a function of a particular matching problem rather than setting it once based on 
experimentation. 

As a first example, we consider the application of the interpretation tree method
[Grimson and Lozano-Perez, 1984, 1987; Ettinger, 1987, 1988; Murray, 1987a, 1987b;
Murray and Cook, 1988] to recognizing sets of two-dimensional parts. In this approach,
a tree of possible matching model and image features is constructed. Each
level of the tree corresponds to one of the image features. At every node of the tree
there is a branch corresponding to each of the model features, plus a special branch
that accounts for image features that do not match the model. A path from the
root to a leaf node maps each image feature onto some model feature or the special
"no-match" symbol. The tree is searched by maintaining pairwise consistency
among the nodes along a path. Consistency is checked using distance and angle
relations between the model and image features specified by the nodes. If a given node
is inconsistent with any node along the path to the root, then the subtree below that
point is pruned from further consideration.
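
The search just described can be sketched in a few lines. This is a minimal illustration
under simplifying assumptions, not the RAF implementation: features are taken to be 2D
points rather than edges, the only pairwise constraint is a distance check with a
hypothetical tolerance `tol`, and each model feature may be used at most once. The names
`consistent` and `interpretation_tree` are illustrative.

```python
import math

def consistent(img, mod, i, a, k, b, tol):
    """Pairwise distance constraint between pairings (i -> a) and (k -> b)."""
    if a == b:                        # simplification: use each model feature once
        return False
    return abs(math.dist(img[i], img[k]) - math.dist(mod[a], mod[b])) <= tol

def interpretation_tree(img, mod, tol=0.2):
    """Depth-first search; level i of the tree labels image feature i."""
    best = [None] * len(img)

    def matched(path):
        return sum(p is not None for p in path)

    def extend(path):
        nonlocal best
        i = len(path)
        if i == len(img):             # leaf: every image feature has a label
            if matched(path) > matched(best):
                best = list(path)
            return
        for a in list(range(len(mod))) + [None]:    # None is the "no-match" branch
            ok = a is None or all(
                b is None or consistent(img, mod, i, a, k, b, tol)
                for k, b in enumerate(path))
            if ok:                    # an inconsistent node prunes its whole subtree
                extend(path + [a])

    extend([])
    return best

# Three of four model points (translated) plus one clutter point.
model = [(0, 0), (4, 0), (4, 3), (1, 5)]
image = [(10, 20), (14, 20), (14, 23), (40, 40)]
match = interpretation_tree(image, model)
fraction = sum(p is not None for p in match) / len(model)
print(match, fraction)
```

The value `fraction` is the quantity to which the threshold discussed in this section
would be applied.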

A consistent path from the root to a leaf that accounts for more than some 
fraction of the model features is accepted as a correct match. This threshold is chosen 
experimentally. In our analysis of thresholds for the interpretation tree method, 
we use the parameters for the objects demonstrated in [Grimson and Lozano-Perez 
1987], and the parameters for a typical scene in the experimentation described there. 
These values are substituted into equation (2), and then a threshold $f_0$ is computed
using equations (3) and (3a).

In the experiments reported in [Grimson and Lozano-Perez, 1987], the following 
parameters hold: 

$m = 32$

$s = 100$

$L = 23.959$

$\epsilon_p = 10$

$\epsilon_a = \pi/10$

We have computed $c$ as a function of the amount of occlusion $\alpha$, and then
determined the corresponding threshold $f_0$ on the fraction of model features. Note
that an occlusion of 1 represents the limiting case in which only a point on the line
is visible. The results are given in Table 1. The first column of the table shows the
values of $f_0$ computed using equation (3a). Recall that this method integrates over
the transformation space in order to limit the expectation of a randomly occurring
match by setting

$$\delta = \epsilon_a \left(\frac{\epsilon_p}{D}\right)^2$$

For comparison, the second and third columns of the table are computed using
equation (3), with the probability of a random match, $\delta$, set to .001 and .0001,
respectively.

Occlusion    $f$, eqn (3a)    $f$, with $\delta$ = .001    $f$, with $\delta$ = .0001
   0.0           0.225                0.173                    0.230
   0.1           0.244                0.188                    0.250
   0.2           0.263                0.202                    0.270
   0.3           0.282                0.217                    0.289
   0.4           0.301                0.231                    0.308
   0.5           0.319                0.245                    0.327
   0.6           0.337                0.259                    0.346
   0.7           0.355                0.273                    0.364
   0.8           0.374                0.287                    0.383
   0.9           0.392                0.301                    0.401
   1.0           0.409                0.315                    0.420

Table 1. Predicted bounds on termination threshold, as a function of the amount of
occlusion, for trials of the RAF system.

As expected, the bound on $f$ increases as the amount of occlusion increases.
Note that this bound is limited in scope even as the occlusion factor ranges over the
entire possible range; that is, even for occlusions ranging from none to all, the
bound on $f$ only varies over a range of 0.225 to 0.409. It is interesting to compare
these results with empirical observations. Grimson and Lozano-Perez report that
running the RAF system on a variety of images of this type using thresholds of $f = .4$
resulted in no observed false positives, while using thresholds of $f = .25$ would often
result in a few false positives. Since on average the occlusion was roughly .5, this
observation fits nicely with the predictions of Table 1, namely that a threshold of .4
should yield no errors, while a threshold of .25 cannot guarantee such success.
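
The entries of Table 1 can be recomputed with a short script. The sketch below rests on
several assumptions that are not stated in this section: that equation (3) has the form
$f_0 \ge \log(1/\delta) / (m \log(1 + 1/(msc)))$; that $c$ is the volume
$2\epsilon_a (2\epsilon_p \alpha L + \pi \epsilon_p^2)$ normalized by $2\pi D^2$; that
$\epsilon_a = \pi/10$ and $D = 500$; and that equation (3a) corresponds to
$\delta = \epsilon_a (\epsilon_p / D)^2$. With these choices the printed values agree with
Table 1 to about three decimal places, which is the only justification offered for them.

```python
import math

# Parameters reported for the RAF experiments.
m, s, L, eps_p = 32, 100, 23.959, 10.0
# Assumed values (not given in this section); chosen because they reproduce Table 1.
eps_a, D = math.pi / 10, 500.0

def c_2d(alpha):
    """Assumed normalized volume of transformations consistent with one pairing."""
    area = 2 * eps_p * alpha * L + math.pi * eps_p ** 2   # translation line dilated by eps_p
    return (2 * eps_a * area) / (2 * math.pi * D ** 2)

def f0(alpha, delta):
    """Assumed form of equation (3), evaluated at occlusion alpha."""
    return math.log(1 / delta) / (m * math.log(1 + 1 / (m * s * c_2d(alpha))))

delta_3a = eps_a * (eps_p / D) ** 2    # assumed choice of delta for equation (3a)

for k in range(11):
    a = k / 10
    print(f"{a:.1f}  {f0(a, delta_3a):.3f}  {f0(a, 1e-3):.3f}  {f0(a, 1e-4):.3f}")
```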

If we use the lengths of the data features to weight the individual feature matchings,
then substituting into equation (4) leads to the predictions shown in Table 2.
These values were computed using equation (3a) in the same manner as the first
column of Table 1. Again, this agrees with empirical experience for the RAF system,
in which weighted matching using thresholds of $f = .25$ almost always led to no
false positives, while using thresholds of $f = .10$ would often result in a few false
positives.
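
The weighted case can be checked in the same way. The following sketch assumes that
equation (4) has the form
$f_0 \ge \log(1/\delta) / (mL \log(1 + 1/(ms\bar{\ell}c)))$, that $c$ is the volume
$2\epsilon_a (2\epsilon_p \alpha L + \pi \epsilon_p^2)$ normalized by $2\pi D^2$, and that
$\epsilon_a = \pi/10$, $D = 500$ and $\delta = \epsilon_a (\epsilon_p / D)^2$. None of these is
stated in this section; they are adopted here because the resulting values agree with
Table 2 to within about 0.001.

```python
import math

# Parameters reported for the RAF experiments.
m, s, L, eps_p = 32, 100, 23.959, 10.0
# Assumed values (not given in this section).
eps_a, D = math.pi / 10, 500.0
delta = eps_a * (eps_p / D) ** 2       # assumed choice of delta for equation (3a)

def c_2d(alpha):
    """Assumed normalized volume of transformations consistent with one pairing."""
    area = 2 * eps_p * alpha * L + math.pi * eps_p ** 2
    return (2 * eps_a * area) / (2 * math.pi * D ** 2)

def f0_weighted(alpha, mean_len):
    """Assumed form of equation (4); mean_len is the average data edge length."""
    lam = m * s * mean_len * c_2d(alpha)          # expected weighted vote
    return math.log(1 / delta) / (m * L * math.log(1 + 1 / lam))

for k in range(10):
    a = k / 10
    row = [f0_weighted(a, w * L) for w in (1.0, 0.75, 0.5)]
    print(f"{a:.1f}  " + "  ".join(f"{v:.3f}" for v in row))
```

Because shorter data edges lower the expected weighted vote, the bound falls as the
average data edge length decreases, which is the trend across the columns of Table 2.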

Occlusion    $f$ with $\bar{\ell} = L$    $f$ with $\bar{\ell} = .75L$    $f$ with $\bar{\ell} = .5L$
   0.0             0.119                     0.091                       0.062
   0.1             0.136                     0.103                       0.071
   0.2             0.153                     0.116                       0.079
   0.3             0.171                     0.129                       0.088
   0.4             0.188                     0.142                       0.097
   0.5             0.205                     0.155                       0.105
   0.6             0.222                     0.168                       0.114
   0.7             0.240                     0.181                       0.123
   0.8             0.257                     0.194                       0.131
   0.9             0.274                     0.207                       0.140

Table 2. Predicted bounds on termination threshold, as a function of the amount of
occlusion, for trials of the RAF system. In this case, the lengths of the matched edges are
used, instead of just the number of matched edges.

As a second example, we consider the HYPER system of Ayache and Faugeras
[1986]. Like RAF, HYPER uses geometric constraints to find matches of
data to models. An initial match between a long data edge and a corresponding
model edge is used to estimate the transformation from model coordinates to data
coordinates. This estimate is then used to predict a range of possible positions
for unmatched model features, and the image is searched over this range for potential
matches. Each potential match is evaluated using position and orientation
constraints, and the best match within error bounds is added to the current
interpretation. The additional model-data match is used to refine the estimate of the
transformation, and the process is iterated.

Although not all of the parameters needed to apply our analysis are given in 
the paper, we can estimate many of them from the illustrations provided in the 
article. Given several estimates for the error in the measurements, a range of values
for the threshold $f$ is listed in Table 3. Object-1 and Object-2 refer to the object
labels used by Ayache and Faugeras. In these examples, we use orientational errors
of $\epsilon_a = \pi/10$ and $\pi/15$ radians and positional errors of $\epsilon_p = 3$ pixels.

In HYPER, a threshold of .25 is used to discard false positives, and Ayache and 
Faugeras report the observation of no false positives during a series of experiments 
with their system. For the two objects listed in Table 3, Ayache and Faugeras 
report that their system found interpretations of the data accounting for a fraction 
of .55 of the model for Object-1 and accounting for a fraction of .40 of the model for 
Object-2. Both these observations are in agreement with the thresholds predicted 
in Table 3, for different estimates of the data error. 

Thus for two different recognition systems (RAF and HYPER), using both weighted 
and unweighted matching schemes, we see that the technique developed in this paper 
yields matching thresholds that are similar to those determined experimentally by 
the designers of the systems. 

                      Object-1                                  Object-2
Occlusion    $f$ ($\epsilon_a = \pi/10$)    $f$ ($\epsilon_a = \pi/15$)    $f$ ($\epsilon_a = \pi/10$)    $f$ ($\epsilon_a = \pi/15$)
   0.0             0.224                 0.185                 0.206                 0.168
   0.1             0.243                 0.199                 0.225                 0.181
   0.2             0.261                 0.212                 0.243                 0.195
   0.3             0.279                 0.225                 0.262                 0.208
   0.4             0.297                 0.238                 0.280                 0.221
   0.5             0.315                 0.251                 0.298                 0.234
   0.6             0.333                 0.264                 0.316                 0.247
   0.7             0.350                 0.277                 0.335                 0.260
   0.8             0.368                 0.289                 0.353                 0.273
   0.9             0.386                 0.302                 0.371                 0.285

Table 3. Predicted bounds on termination threshold, as a function of the amount of
occlusion, for trials of the HYPER system. The first two columns for $f$ are for Object-1, the
final two for Object-2.

6. Conclusion

In order to determine what constitutes an acceptable match of a model to an image,
most recognition systems use an empirically determined threshold on the fraction
of model features that must be accounted for. In this paper we have developed a
technique for analytically determining the fraction of model features $f_0$ that must
be matched in order to limit the probability of a random conspiracy of the data to
some level $\delta$. This fraction $f_0$ is a function of the type of feature, the number of
model features $m$, the number of sensor features $s$, and bounds on the translation
error $\epsilon_p$ and the angular error $\epsilon_a$ of the sensor and feature detector.

Our analysis shows that the proper threshold varies with the number of model 
and data features. A threshold that is appropriate for relatively few data features 
is not appropriate when there are many data features. Thus it is important to be 
able to set the threshold as a function of a particular matching problem, rather than 
setting a single threshold based on some experimentation. The technique developed 
in this paper provides a straightforward means of computing a matching threshold 
for the values of m and s found in a given recognition situation. 

We have applied the technique to two existing recognition systems, and found 
that the predicted thresholds are close to those that were determined experimentally. 
This suggests that the method can be profitably used to analytically determine 
thresholds for model-based matching systems. 

Appendix: Extending the method to other cases 

So far, we have demonstrated our method on the case of recognition of rigid two- 
dimensional objects from two-dimensional edges or vertices. We can readily extend 
our method to other cases as well. In general, equations (3) and (4) still hold, with 
the proviso that c changes as the problem changes. First we consider adding scal- 
ing to the two-dimensional transformation, and then we consider three-dimensional 
transformations. 



Objects that scale 

First, we consider the case in which a two-dimensional object is free to scale within
some predefined range. In this case, the space of possible transformations is
four-dimensional, having two dimensions for translation parameters, one for rotation,
and one for scale.

In this case, the transformation from model coordinates to sensor coordinates
may be represented by

$$v_s = s R_\theta V_M + V_0$$

where $V_M$ is a vector in model coordinates, $R_\theta$ is a rotation matrix corresponding to
an angle of $\theta$, $s$ is a scale factor, $V_0$ is a translation offset, and $v_s$ is the corresponding
vector in sensor coordinates.

As in the earlier case, if we consider the conditions on the transformation so
that the endpoint of a data edge corresponds to the endpoint of a model edge, we
find that

$$m_j - \frac{\ell_j}{2} t_j = s R_{\theta_m} \left[ M_J - \frac{L_J}{2} T_J \right] + V_0.$$

We also have the condition that

$$sL_J \ge \ell_j$$

so that

$$s \ge \frac{\ell_j}{L_J}$$

or, if we allow for error in the measurements, that

$$s \ge \frac{\ell_j - 2\epsilon_p}{L_J}.$$

Hence, for each choice of $s$, the translation is

$$V_0 = m_j - s R_{\theta_m} M_J + \frac{sL_J - \ell_j}{2} R_{\theta_m} T_J.$$

Similarly, if the other endpoints align, we get

$$V_0 = m_j - s R_{\theta_m} M_J - \frac{sL_J - \ell_j}{2} R_{\theta_m} T_J.$$



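Because any intermediate position is also acceptable, the set of translations
consistent with matching model edge $J$ to data edge $j$ is given by

$$V_0(j, J) = \left\{ m_j - s R_{\theta_m} M_J + \gamma R_{\theta_m} T_J \;\middle|\; \gamma \in \left[ -\frac{sL_J - \ell_j}{2}, \frac{sL_J - \ell_j}{2} \right] \right\} \tag{5}$$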



Hence, matching model edge $J$ to data edge $j$ yields a set of points in transformation
space, with a single value for the rotation parameter and a set of values for the
translation that correspond to a line of length $sL_J - \ell_j$, with orientation $R_{\theta_m} T_J$
in the x-y plane, where $s$ can range from the lower bound given above
to some predefined maximum $s_h$.

To determine the full volume of transformation space consistent with a data-model
feature pairing, we must allow for noise in the measurements. As in the non-scaled
case, we can integrate over a range of orientations within $\epsilon_a$ of the computed
one, and we must also integrate along the line of translations defined above as $s$
varies. This implies that the full volume is given by

$$2\epsilon_a \int_{(\ell_j - 2\epsilon_p)/L_J}^{s_h} \left( 2\epsilon_p (sL_J - \ell_j) + \pi\epsilon_p^2 \right) ds$$

which reduces to

$$2\epsilon_a \left( s_h - \frac{\ell_j - 2\epsilon_p}{L_J} \right) \left( \pi\epsilon_p^2 + \epsilon_p (s_h L_J - \ell_j - 2\epsilon_p) \right).$$

This is normalized by the total volume of transformation space

$$2\pi D^2 (s_h - 1)$$

to yield

$$c_s = \frac{\epsilon_a}{\pi} \cdot \frac{s_h - \frac{\ell_j - 2\epsilon_p}{L_J}}{s_h - 1} \left[ \pi \left( \frac{\epsilon_p}{D} \right)^2 + 2\,\frac{\epsilon_p}{D} \left( \frac{s_h L_J - \ell_j}{2D} - \frac{\epsilon_p}{D} \right) \right].$$

For cases involving scale, we can substitute $c_s$ in place of $c$ in the earlier analysis.
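
As a consistency check on the scaled case, the sketch below integrates the volume
numerically and compares it with a closed form and with the normalized coefficient $c_s$.
It assumes that the integrand is $2\epsilon_p (sL_J - \ell_j) + \pi\epsilon_p^2$, that the scale
ranges from $(\ell_j - 2\epsilon_p)/L_J$ to $s_h$, and that the normalizing volume is
$2\pi D^2 (s_h - 1)$. The function names and the numerical values are illustrative and are
not taken from any experiment.

```python
import math

def volume_numeric(L_J, l_j, eps_a, eps_p, s_h, n=1000):
    """Midpoint-rule integral of the translation area over the scale range."""
    s_lo = (l_j - 2 * eps_p) / L_J               # assumed lower limit on the scale
    h = (s_h - s_lo) / n
    total = 0.0
    for i in range(n):
        s = s_lo + (i + 0.5) * h
        total += 2 * eps_p * (s * L_J - l_j) + math.pi * eps_p ** 2
    return 2 * eps_a * total * h

def volume_closed(L_J, l_j, eps_a, eps_p, s_h):
    """Closed form of the same integral."""
    s_lo = (l_j - 2 * eps_p) / L_J
    return 2 * eps_a * (s_h - s_lo) * (
        math.pi * eps_p ** 2 + eps_p * (s_h * L_J - l_j - 2 * eps_p))

def c_s(L_J, l_j, eps_a, eps_p, s_h, D):
    """Normalized coefficient, written term by term as in the text."""
    s_lo = (l_j - 2 * eps_p) / L_J
    r = eps_p / D
    bracket = math.pi * r ** 2 + 2 * r * ((s_h * L_J - l_j) / (2 * D) - r)
    return (eps_a / math.pi) * (s_h - s_lo) / (s_h - 1) * bracket

args = dict(L_J=23.959, l_j=20.0, eps_a=math.pi / 10, eps_p=3.0, s_h=2.0)
D = 500.0
print(volume_numeric(**args), volume_closed(**args))
print(c_s(D=D, **args), volume_closed(**args) / (2 * math.pi * D ** 2 * (args["s_h"] - 1)))
```

Because the integrand is linear in $s$, the midpoint rule is exact up to rounding, so the
two volumes should agree to floating-point precision.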

Three dimensional case 

As a second extension, consider the problem of recognizing three-dimensional ob- 
jects from three-dimensional edges. In this case, the transformation space is six 
dimensional, with three dimensions for translational components, and three for ro- 
tational components. As in the previous cases, we must deduce an expression for c 
that holds in this case. 

We begin with the rotational parameters. Given the unit tangent vector of a 
model edge and of a data edge, there is a set of rotations that will consistently map 
the model tangent into the data tangent. The axis of rotation that will accomplish 
this lies anywhere in the great circle on the unit sphere equidistant from the two 
tangent vectors. For each such axis of rotation, there is a unique angle of rotation 
that will effect the mapping. When we allow for error, the data tangent is only
known to within a cone of radius $\epsilon_a$, and hence the great circle expands into a band
of feasible axes of rotation. If we integrate out the volume of feasible rotations, we
get

$$\int_{\theta=0}^{2\pi} \int_{\phi} \sin\phi \, d\phi \, d\theta = 4\pi\sqrt{1 - \epsilon_a^2},$$

where the inner integral is taken over the band of feasible axes.



To account for the translation, we find that an analysis similar to the two-dimensional
case holds. In particular, the set of feasible translations is a cylinder of
radius $\epsilon_p$ and length $\alpha L$, where $\alpha$ is the amount of occlusion of the edge, capped by
two hemispheres of radius $\epsilon_p$. Hence, the overall volume is given by

$$4\pi\sqrt{1 - \epsilon_a^2} \left( \alpha L \pi \epsilon_p^2 + \frac{4}{3}\pi\epsilon_p^3 \right).$$

If the linear range of values for each dimension of translation is $D$, then the normalized
coefficient in the three-dimensional case is given by



c — 



H4+i®W 